Efficient REST API calls in Python via persistent HTTP connections

Intro

Did you know that the PagerDuty REST API supports HTTP connection reuse? This allows you to make your API calls more efficiently, with less network traffic. Without it, each of the following operations may have to be performed once per API call:

  1. Make a DNS request to resolve api.pagerduty.com (unless the result is already cached)
  2. Establish a TCP connection
  3. Perform a TLS handshake

Each of these prerequisite steps requires round-trip network communication and takes additional time. Without connection reuse, network latency therefore has a much greater impact on the performance of your API calls.

However, by setting the Connection: keep-alive header and reusing the same network socket for subsequent HTTP requests, a client can skip these setup steps on every API call after the first.
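For example, here is a minimal sketch using Requests: a requests.Session keeps connections alive and pools them automatically, so consecutive calls to the same host reuse one socket. The token value is a placeholder, and the two endpoints are just for illustration:

import requests

# A Session reuses the underlying TCP/TLS connection between requests
# to the same host, via urllib3's connection pooling:
session = requests.Session()
session.headers.update({
    'Authorization': 'Token token=YOUR-API-TOKEN-HERE',  # placeholder
    'Accept': 'application/vnd.pagerduty+json;version=2',
})

# Only the first call pays the DNS + TCP + TLS setup cost:
first = session.get('https://api.pagerduty.com/users')
second = session.get('https://api.pagerduty.com/services')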

Testing your Python code for connection reuse

If you are using Requests, you can check whether your Python code is properly leveraging the library’s connection-reuse abilities by examining info-level log messages from the underlying urllib3 library. The following code enables info-level logging to STDERR and provides that visibility:

import logging

# Send log records from all libraries (including urllib3, which Requests
# uses under the hood) to STDERR at INFO level and above:
logging.basicConfig(level=logging.INFO)

# ... make API calls here ...

In the output, you should see a message like the following only once, even across multiple API calls (depending on your version of Requests, the logger name may be urllib3.connectionpool rather than requests.packages.urllib3.connectionpool):

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): api.pagerduty.com
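As a quick sanity check (a sketch: the endpoint is illustrative, authentication is omitted, and the resulting 401 responses don't matter here since we are only watching connection setup), compare module-level calls, which open a fresh connection every time, with calls made through a single Session:

import logging
import requests

logging.basicConfig(level=logging.INFO)

# Module-level calls: each one logs "Starting new HTTPS connection":
requests.get('https://api.pagerduty.com/abilities')
requests.get('https://api.pagerduty.com/abilities')

# Session calls: only the first logs a new connection:
session = requests.Session()
session.get('https://api.pagerduty.com/abilities')
session.get('https://api.pagerduty.com/abilities')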

Python REST API sessions with automatic connection reuse

Not long ago, we found out that our internal Python-based REST API tools did not reuse connections when making multiple API requests. This had some unfortunate consequences: our scripts for bulk operations were quite slow. Moreover, one time we ran a lot of operations in parallel on a single host, and all threads halted with urllib3.ConnectionError exceptions that indicated an error attempting to resolve the host api.pagerduty.com. Apparently, we had reached the DNS resolver’s rate limit before hitting the PagerDuty REST API rate limit!

To address this issue and eliminate code duplication in our tooling, we created pdpyras, a dead-simple REST API client based on requests.Session. It provides a convenient interface for making basic HTTP requests to the REST API, while enforcing connection reuse.
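A minimal usage sketch (the token is a placeholder, and the exact method signatures are best checked against the documentation linked below):

from pdpyras import APISession

session = APISession('YOUR-API-TOKEN-HERE')  # placeholder token

# Plain HTTP methods take a path relative to the REST API base URL, and
# all requests go through one persistent requests.Session:
response = session.get('/users')
if response.ok:
    users = response.json()['users']

# iter_all transparently handles pagination on index endpoints:
for user in session.iter_all('users'):
    print(user['id'])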

For more information on this module and how to install and use it, see:

https://pagerduty.github.io/pdpyras


This is a really cool library. I revamped my existing code to use it and it works really well: it practically halved the size of my script because I don’t have to paginate myself any more!


@simonfiddaman Excellent, thank you for giving it a try and for your kind words and feedback!

Please feel free to let me know your thoughts, or any enhancements you’d like to see that would expand its utility without adding too much complexity.

One thing I’m thinking I’ll give users in the next version is the option to customize its behavior in response to the different error statuses. That behavior is currently hard-coded as follows (a rough sketch of the policy appears after this list):

  • If it’s a 429, retry indefinitely (with increasing cooldown) until it gets through
  • If it’s a 401, raise an exception (because that pretty much means the token is invalid, and so nothing more can be done)
  • Anything else: return the requests.Response object
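To illustrate, here is a rough sketch of equivalent logic written with plain Requests; the function name and the backoff constants are invented for this example and are not pdpyras internals:

import time
import requests

def request_with_policy(session, method, url, **kwargs):
    # 429: retry indefinitely, with an increasing cooldown between tries
    # 401: raise, since the token is almost certainly invalid
    # anything else: hand the response back to the caller
    cooldown = 1
    while True:
        response = session.request(method, url, **kwargs)
        if response.status_code == 429:
            time.sleep(cooldown)
            cooldown *= 2
            continue
        if response.status_code == 401:
            raise RuntimeError('Invalid credentials (HTTP 401).')
        return response

# e.g.: request_with_policy(requests.Session(), 'get',
#                           'https://api.pagerduty.com/users')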

I can foresee some scenarios where this might be useful:

  • Someone might want to raise an exception for a 403 (which would result from using their user-level API key to do something that they do not have permission to do).
  • In the case of a 502 (which doesn’t happen often, and pages PagerDuty engineers when it does), the error can sometimes be circumvented by waiting a bit before retrying. One might want the option to automatically back off and retry on a 502 status, up to some maximum number of retries.

They all sound like excellent upgrades, @demitri! 🙂

A little update

After lots of head scratching and keyboard hammering, I feel confident enough in my work at this point to release version 2 of the pdpyras library with some very opinionated improvements that are backwards compatible.

Updated documentation for 2.0 is up on the GitHub.io page, and the new version can be downloaded from GitHub or installed through PyPI.

New features:

  • New abstraction for generic CRUD operations, just to save more effort and zap more code duplication (a short sketch follows this list)
  • User-configurable retry-on-HTTP-error logic
  • Minor bugfixes and cleanup here and there
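
For instance, a sketch of the new CRUD-style convenience methods; the resource ID and token are placeholders, and the exact signatures should be checked against the documentation:

from pdpyras import APISession

session = APISession('YOUR-API-TOKEN-HERE')  # placeholder token

# The "r"-prefixed methods (rget, rdelete, etc.) wrap the plain HTTP
# verbs and work with the decoded resource body directly:
user = session.rget('/users/PXXXXXX')  # placeholder user ID
print(user['email'])
session.rdelete('/users/PXXXXXX')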

All future updates will be chronicled in the changelog, which (d’oh) I forgot to compile for this release.

Note on deprecation warnings in iter_all and find

In Q1 2019 (even if no new features are added), a new minor version will be released in which the behavior upon encountering an HTTP error in APISession.iter_all and APISession.find will be to raise PDClientError (instead of silently stopping iteration). To enable this behavior in the current version, and avoid the deprecation warning that will otherwise be printed, set pdpyras.APISession.raise_if_http_error = True.
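In code, the opt-in is a single class attribute assignment, done before making any requests:

import pdpyras

# Adopt the future behavior now and suppress the deprecation warning:
pdpyras.APISession.raise_if_http_error = True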

This change is being made to enforce consistent behavior throughout the API access abstraction methods, and to be true to certain tenets of PEP 20 (the Zen of Python):

Explicit is better than implicit.

Errors should never pass silently.
Unless explicitly silenced.

Addendum 2018-10-05

Bugs in the methods APISession.rdelete and raise_on_error have been fixed in a patch release, 2.0.2, out now.

The documentation also has some better examples and describes how to use more of the new features.
